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THE  THEORY  OF  DYWAHIC  PROOHAMMINO 
Hlehard  BtllMn 


§1 .  Introduction 

Before  turning  to  a  discussion  of  soae  representative 
problems  which  will  persiit  us  to  exhibit  v'>rious  mathematical 
features  of  the  theory,  let  ua  praaant  a  brief  survey  of  the 
fundamental  concepts,  hopes,  and  aspirations  of  dynamic  programming. 

To  begin  with,  the  theory  was  created  to  treat  the  ms  the- 
metical  problems  arising  from  the  study  of  various  multi-stage 
decision  processes,  which  stay  roughly  be  described  in  the  fol¬ 
lowing  way:  We  h*ve  a  physical  system  whose  st^te  at  <»ny  time  t 
is  determined  by  a  aet  of  quantities  which  we  call  state  para¬ 
meters,  or  state  variables.  At  certain  times,  which  m-y  be  pre¬ 
scribed  in  advance,  or  which  may  be  determined  by  the  process 
Itself,  we  are  called  upon  to  make  decisions  which  will  affect 
the  state  of  the  system.  These  decisions  are  equivalent  to 
transfonnations  of  the  state  variables,  the  choice  of  a  decision 
being  identical  with  the  choice  of  a  transfomation.  The  out¬ 
come  of  the  preceding  decisions  is  to  be  used  to  guide  the  choice 
of  future  ones,  with  the  purpose  of  the  whole  process  that  of 
maximizing  sosm  function  of  the  parameters  describing  the  final 
state. 

Examples  of  processes  fitting  this  loose  description  are 
furnished  by  virtually  every  phase  of  modem  life,  from  the  plan¬ 
ning  of  industrial  production  lines  to  the  scheduling  of  patients 
at  a  medical  clinic;  from  the  determination  of  long-term 
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Investnent  programs  for  universities  to  the  determination  of  a 
replaceiaent  policy  for  machinery  In  factories;  from  the  program¬ 
ming  of  training  policies  for  skilled  and  unskilled  l-^bor  to 
the  choice  of  optimal  purchasing  and  Inventory  policies  for 
department  stores  and  military  establishments. 

It  Is  abundantly  clear  from  the  very  brief  description  of 
possible  applications  that  the  problems  arising  from  the  study 
of  these  processes  are  problems  of  the  future  as  well  as  of  the 
limaedlate  present. 

Turning  to  a  more  precise  discussion,  let  us  Introduce  a 
small  amount  of  terminology.  A  sequence  of  decisions  will  be 
called  a  policy,  and  a  policy  which  Is  m^st  advantageous  accord¬ 
ing  to  some  preassigned  criterion  will  be  called  an  optima  1 
policy. 

The  classical  approach  to  the  m'> ‘ h*»mf' t lc»  1  problems  arising 
from  the  processes  described  above  Is  to  consider  the  set  of  all 
possible  sequences  of  decisions,  which  Is  to  say,  rhe  set  of 
all  feasible  policies,  compute  the  return  from  each  such  feasible 
policy,  and  then  maximise  the  return  over  the  set  of  '•11  feasible 
policies. 

It  Is  evident  that  straightforward  and  reasonable  as  such 
a  procedure  Is,  It  Is  often  not  practical.  For  processes  Involv¬ 
ing  even  a  moderate  number  of  stages  and  '»  moderate  r^^nge  of 
choices  at  each  stage,  the  dimension  of  the  resultant  maxlmlz«»- 
tlon  problem  will  be  uncomfortably  high,  with  continuous  processes 
requiring  maximization  over  function  space. 
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If  w«  ■o«ent«rlly  r«-«xaiilne  the  situation,  not  aa  a 
mathe«atielan ,  but  aa  a  "practical  man,"  we  see  that  this  price 
of  excessive  dlaenalonallty— a  price  that  occasionally  stakes 
even  a  aodem  computing  machine  crlnge-^rlses  from  a  demand 
for  too  much  Information.  How  much  Information  Is  actually 
required  to  carry  out  a  multl-atage  decision  process? 

Do  we  require  a  knowledge  of  the  complete  sequence  of 
decisions,  those  to  be  perfoimed  at  the  present  stage,  those  at 
the  next  stage,  and  so  on?  Not  at  nil!  It  Is  sufficient  to 
furnish  a  general  prescription  which  determines  nt  «‘ny  stage  the 
decision  to  be  made  In  tenia  of  the  current  state  of  the  system. 
In  other  words.  If  at  any  particular  time  we  know  what  to  do,  It 
Is  never  necessary  to  know  the  decisions  required  *‘t  subsequent 
times. 

Donning  our  laathematlcal  cap  again,  we  see  that  this  c  xarnon^ 
sense  attitude  reduces  the  dimension  of  the  problem  to  Its  proper 
level,  namely  the  dimension  of  the  decision  problem  that  con¬ 
fronts  one  at  any  particular  time. 

For  the  case  of  deterministic  processes,  which  Is  to  say, 
those  where  the  Initial  state  and  the  declalon  uniquely  detenilne 
the  outcome,  both  viewpoints  are  possible.  For  the  case  of 
stochastic  processes,  where  a  decision  determines  only  a  distri¬ 
bution  of  outcome  states,  the  classical  enumeratlve  approach  Is 
vlrttjally  Impossible. 


The  Pundaaenttl  Approach 

As  stated  above,  the  basic  Idea  of  the  theory  of  dynaaie 
programing  Is  that  of  viewing  an  optlaal  policy  as  one  deter- 
Mining  the  decision  required  at  each  tiae  in  tents  of  the  cur¬ 
rent  state  of  the  systeii.  Following  this  line  of  thought,  the 
basic  functional  equations  given  below  describing  the  quantita¬ 
tive  aspects  of  the  theory  are  unlfonily  obtained  from  the  fol¬ 
lowing  Intuitive 

Principle  of  Optiaality:  An  optlaal  policy  has  the  property  that 
whatever  the  initial  state  and  initial  decisions  are,  the  reaain- 
Ing  decisions  aust  constitute  an  optiaal  policy  with  reg^^rd  to 
the  state  resulting  from  the  first  decisions. 

The  functional  equations  we  shall  derive  are  of  a  difficult 
and  fascinating  type,  wholly  different  froa  any  encountered  pre¬ 
viously  In  analysis.  Nonetheless,  as  we  shall  see  below,  they 
may  be  utilized  to  provide  an  entirely  new  approach  to  soae  clas¬ 
sical  problems. 

^3*  Mathematical  Formulation— 1 :  A  Discrete  Deterainistic  Process 

To  Illustrate  the  type  of  functional  equ-tlon  that  "rises 
from  an  application  of  the  principle  of  optimality,  let  us  begin 
with  the  simplest  case  of  a  detenslnistlc  process  where  the  sys¬ 
tem  Is  described  at  any  time  by  an  M-diaensional  vector 
P  •  (Pi  .P# » •  •  •  »Pj(|)  *  constralr.ted  to  11^  within  some  region  D. 

Let  T  -  where  k  runs  over  a  set  which  may  be  finite,  enu¬ 

merable,  or  continuous,  be  a  set  of  transformations  with  the 
property  that  p€D  implies  that  T|j(p)tD  for  all  k. 


Let  US  sssuae  that  we  are  considering  an  N-stage  proceas 
to  be  carried  out  to  maximize  some  scalar  function*  R(p)  of  the 
final  state.  We  shall  call  this  function  the  N-^tage  return. 

A  policy  consists  of  a  selection  of  N  transformations* 

P  —  (Ti *Tt • • • • *T^) •  yielding  successively  |Im  states 


Pi  -  Ti(p). 
Pa  -  T»(pi)* 


if 

If  D  is  a  finite  region*/eaeh  Tj^(p)  is  continuous  in  p*  and 
R(p)  is  a  continuous  function  of  p  for  p£P*  it  is  clear  that  an 
optimal  policy  exists.  The  maximum  value  of  R(p||)»  detenained  If 
an  optimal  policy*  will  be  a  function  only  of  the  Initial  vector  i 
and  the  number  of  stages  N.  Let  us  then  define 


(2)  fy.(p)  -  M|x  P(pj,) 


•  the  N-«tage  return  obtained  using  '^n  optimal 
policy  starting  from  the  initial  state  p. 


To  derive  a  functional  equation  for  P^^Cp)*  employ  the 
principle  cited  above.  Assume  that  we  choose  some  trsnsfoiemtion 
T^  as  a  result  of  our  first  decision*  obtaining  thereby  a  new 
state  T^(p)*  The  maximum  return  from  the  following  (N-1)  stages 
is*  by  definition*  (Tjj(p) ) .  It  follows  th^t  k  must  now  be 


chosen  so  as  to  Baxlalz#  this.  The  result  Is  the  basic  func- 
tlonsl  equation 

(3)  “  "5*  N-2.?.”-  • 

It  la  clear  that  a  knowledge  of  any  particular  optimal 
policy*  not  necessarily  unique*  will  yield  fj^Cp)*  idiich  Is 
unique.  Conversely,  given  the  sequence  optimal 

policies  may  be  detensined. 

We  thus  have  a  duality  between  the  space  of  functions  and 
the  space  of  policies  which  Is  of  great  theoretical  and  compu¬ 
tational  Importance.  This  point  will  be  discussed  again  below. 

S ^ •  Wathematlcal  Formulation— 11 :  Discrete  Stochastic  Case 

Let  us  now  consider  the  esse  where  the  transformations  are 
stochastic  rather  than  detenalnlstlc .  A  choice  of  a  transfonaa— 
tlon  T)(  now  yields  a  stochastic  vector  z  as  the  new  state  vec¬ 
tor  with  an  associated  vector  distribution  function  dG^(p,z). 

It  Is  clear  that  It  Is  now  In  general  meaningless  t?  speak 
of  maximizing  the  return.  We  must  agree  to  measure  the  value  of 
a  policy  In  terms  of  some  average  value  of  the  function  of  the 
final  state.  Let  us  call  this  expected  value  the  N-atage  return. 

We  now  define  fj^(p)  before  In  terms  of  the  N-etage  return. 
If  z  Is  the  state  resulting  from  any  Initial  transformation  Tj^* 
the  return  from  the  last  (N-l)  stages  will  be  f^^  ^(x).  The 
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txptet«d  r«tum  at  a  raault  of  tha  eholea  of  T|^  Is 

0 

J  fj^_j(i)<JO|j(p,«) 


(1) 


sdD 


Hcnect  the  functional  equation  for  fy(p)  la 


(2) 


C 

f^(p)  -  Mg*  fj._j(*)dO(p.a), 


Not#  that  tha  dataminlatle  pr'>eaas  Bay  ba  eonaldarad  to  ba 
■araly  a  particular  eaaa  of  a  atoehaatle  proeaaa. 

^3*  Mathattlcal  Ponrolation— Hit  Infinlta  Stochastic  Procasa 
For  Bathaafttleal  purposaa,  it  Is  fraquantly  usafXil  to  eon- 
sldar  tha  fictitious  Infinlta  process  In  which  thara  nra  an 
unboundad  nuabar  of  stagaa.  In  that  cssee  tha  saquanca  f)j(p) 

Is  replaced  by  tha  single  function  f(p)  «  f^^Cp)*  and  tha  forasl 
equivalent  of  (3*P)  Is 


(1) 


f(p)  -  ftax 

^  xCD 


C 

^  f(s)dOi,(Pe 


*) 


^6.  Hathasatlcal  Ponmlatlon— >1V;  Continuous  DetarBlnlstlc 

Process 

If  wa  consider  a  continuous  process  where  a  decision  snist 
ba  Bade  at  each  point  of  a  tlsia  Interval,  wa  are  lad  to  aixl- 
Blzatlon  problaBS  over  function  spaces.  Tha  slBplest  exanplas 
of  these  probleBS  are  funnlshad  by  tha  calculus  of  variations. 
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As  vs  shall  show  bslo«»  our  spproseh  leads  to  s  new  view  of 
this  elssslcsl  theory. 

Definlnt 

(1)  r(p;T)  •  the  return  obtained  over  a  tiae  interval  0,T 

using  an  optiaal  policy  starting  from  an 
initial  state  p 

the  analogue  of  the  functional  equation  of  (3*3)  is 
(8)  f(p;S.T)  -  f(T-(p);T) 

0|5.a  ® 

Where  the  naxiBin  is  taken  over  all  allowable  deciaions  made 
over  the  initial  interval 

As  soon  as  we  consider  infinite  processes,  we  are  confronted 
by  the  difficulty  of  showinur  that  the  maximum  is  actually  attained. 
Consequently,  in  general,  we  must  initially  replace  (6.2)  by 
the  rigorous  equation 

(3)  f(p;SeT)  -  Sup  f(T  (p);T) 

and  then  show,  under  various  assumptions,  that  the  extremum  is 
attained. 

As  will  be  shown  below,  the  limiting  form  of  (6.3)  as 
S  -  — >  0  yields  a  partial  differential  equation. 

We  shall  not  discuss  here  the  corresponding  problem  for  the 
case  of  stochastic  processes  since  a  number  of  interesting  and 
difficult  conceptual  questions  arise  which  have  not  as  yet  been 
fully  resolved. 
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^7.  Sowe  Examples— I:  An  Allocation  Problea 

Before  proceedlnc  •ny  further  with  our  general  discussion, 
let  us  Illustrate  these  Ideas  by  aeans  of  s  number  of  examples, 
of  both  stochastic  and  deterministic  type,  which  are  repre¬ 
sentative  of  the  types  of  problems  which  fall  within  the  domain 
of  the  general  theory. 

Problem  1.  We  are  given  a  quantity  x  >  0  thnt  may  be  divided 
Into  two  non-oegatlve  parts,  y  and  x-y.  From  y  we  obtain  a 
return  of  g(y),  at  the  expense  of  reducing  y  to  ay  where 
0  <  a  <  1;  from  x-y  we  obtain  a  return  of  h(x-y)  at  the  expense 
of  reducing  x-y  to  b(x-y)  where  0  <  b  <  1.  The  process  Is  now 
repeated  with  the  new  Initial  quantity  ay  4  b(x-y),  and  so  on 
Indefinitely.  How  does  one  allocate  at  each  stage  so  as  to 
maximize  the  total  return  obtained  over  the  entire  process? 

This  Is  a  very  simple  prototype  of  a  I'^rge  class  of  Impor¬ 
tant  allocation  and  Investment  problems  which  occur  In  a  number 
of  diverse  activities. 

Let 


(1)  f(x)  •  the  total  return  obtained  employing  an 

optimal  policy. 


Arguing  as  above.  It  Is  readily  seen  that  f(x)  satisfies  the 
functional  equation 

f(x)  -  Sup  r g(y)  ♦  h(x-y)  ♦  f(ay  4  b(x-y))  1  ,  x  >  0 
0^<x  L-  J 

f(0)  -  0 


(2) 
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Por  a  discussion  of  the  various  waya  In  which  this  equa¬ 
tion  can  arise,  and  soae  of  the  analytic  results  which  csn  be 
obtained,  we  refer  the  reader  to  ,  jj] ,  DG*  D3  • 

Treatment  of  the  closely  related  optlmol  Inventory  problsai 
■ay  be  found  In  [2],  »  Q3« 

38.  Soiae  Examples— II ;  Stochastic  Gold  Wining 
Let  us  now  consider  the  following  example: 

Problew  2.  We  are  fortunate  enough  to  possess  two  gold  mines. 
Anaconda  and  Bonanza,  and  a  sensitive  gold-mining  machine  with 
the  following  characteristics:  If  the  machine  Is  used  In 
Anaconda,  It  will  nine,  with  probability  p,  a  fixed  fraction  r 
of  the  gold  there  and  be  undamaged;  with  probability  (1— p)  It 
will  mine  nothing  and  be  damaged  beyond  repair.  If  the  machine 
Is  used  In  Bonanza,  It  will  mine,  with  probability  q,  a  fixed 
fraction  s  of  the  gold  there  and  be  undnmaged;  with  probability 
(I-q)  It  will  nine  nothing  and  be  damaged  beyond  repair. 

At  each  stage,  as  long  as  the  machine  Is  undamaged,  we 
have  our  choice  of  using  the  machine  In  Anaconda  -'r  Bonanza. 
Given  the  Initial  amounts,  x  and  y  respectively  In  each  mine, 
what  sequer.ce  of  choices  maximizes  the  expected  amount  mined 
before  the  machine  Is  damaged? 

Let 


(1)  f(x,y)  •  the  expected  amount  of  gold  mined  before  the 

machine  Is  damage-^  using  an  optimal  policy, 
starting  with  x  In  Anaconda  and  y  In  Bonanza. 


It  Is  easily  seen  that  f(K,y)  satisfies  the  functional 


equation 


(2) 


f(x,y) 


♦  f(  (l-r)x,y)3  “ 
qjay  ♦  f(x.(l-s)y)  2  _ 


The  solution  haa  the  f')llowing  simple  structure: 


(3) 


a.  For  prx/(l-r)  >  qsy/d-a),  choose  A 

b.  F^r  prji/(l-r)  <  qay/d-a),  choose  B 

c.  For  prx/(l-r)  •  qsy/(l-e),  choose  either 


Using  this  prescription,  f(x,y)  m'^y  be  computed  recurrently. 
The  boundary  curve  between  the  two  declalona  regions  is  the  locus 
of  points  where  ianediate  expected  gain  over  Iwmedlite  expected 
loss  Is  the  saiae  for  both  choices.  Unfortunately,  as  a  counter* 
example  of  Karlin  and  Shapiro  ^  shows,  this  simple  and  intui* 
tlve  rule  is  not  valid  generally  in  more  complicated  decision 
processes . 

For  a  discussion  of  further  results  and  extensions  of  both 
discrete  and  continuous  type,  see  ,  [5O »  6(1  »  K*  65* 

^ 9 •  Some  Examples — III:  A  Problem  in  the  Calculus  of  Variations 
A  simple  exnraple  of  a  continuous  decision  pr  cess  Is  fur¬ 
nished  by  the  following  probleni  in  the  colculus  of  v''rlatlona: 

Problem  Maximize  J  F(r,y)dt  over  all  y  where  x  and  y  are 

o 

connected  by  the  relation  dx/dt  •  0(x,y),  x(0)  •  c. 
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The  classical  technique  In  th»*  cilculus  of  varl  tl'ns, 
patterned  directly  after  the  technique  us^^d  In  lirl 7a t Ion 

problems  In  f Inl te-<1  Imens  Iona  1  spaces,  consists  of  ''or  sld^rl  g 
the  function  yielding  an  extremum  as  a  p-'ln*  lr»  func*lon  space. 
This  point  la  now  characterised  by  means  of  variational  prper- 
tles,  of  which  the  most  Important  la  the  Euler  equatl  n. 

This  approach  corresponds  to  finding  y  as  a  furictlon  of  t. 
Instead,  we  shall  view  the  problem  as  a  continuous  declslor. 
process  and  seek  to  determine  y  at  any  tlm#»  as  o  functlor,  of  the 
two  state  parameters,  c  and  " . 

Let  us  then  set 

(1)  r(c,T)  •  y.»x  }  P(x.y)dt 

7  O 


We  shall  In  what  follows  proceed  completely  formally,  a.asumlng 
the  maximum  Is  attained,  that  all  functions  hove  the  requisite 
number  of  continuous  derivatives,  and  so  on.  Using  the  principle 
of  optimality,  we  see  that  f(c,T)  satisfies  the  equation 


(2) 


f(c,S-»>T)  -  «ax  ..r  ^  F(x,y)dt  e  J  F(x.y)dtl 

^  ^ 


yCo.C] 


J 


F(x,y)dt  ♦  f(c(S),T) 


where  c(S)  Is  xat  t«S 
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Astunlng  that  y  is  continuous,  we  obtain  after  a  sinple  coai— 
putatlon  the  Halting  form  of  (2)  as  S  - >  0 

(3)  f^  -  Max  QP(e,v)  ♦  G(c.v)fg3 

where  v  •  y(0).  Proceeding  formally,  m  h?"ve  for  the  detenalna- 
tlon  of  the  aaxlisua 


(M  %  ♦  Ov^c  •  0 


Plialnatlng  f  between  (3)  and  (4)  we  obtain  the  first-order 
partial  differential  equation 


(5) 


POy-CF^ 


•)  V  ♦ 
'  V  c 


PO^-OPy 

~ir — 


The  chai*acterlatic8  of  this  equation  lead  directly  to  the  Euler 
equation  obtained  by  the  usual  variational  approach: 


(6) 


'y  ■ 


'k  "y 


The  saae  Is  true  In  the  aultl-^llsienslonal  problen  where  x,y 
and  0(x,y)  are  vectors  and  F(x,y)  Is  a  scalar  function.  The 
case  where  the  Integrand  contains  t  explicitly  can  always  be 
reduced  to  the  above  by  the  Introduction  of  a  new  dependent 
variable. 


If  we  add  to  our  original  problem  a  constraint  such  as 
0  <  y  <  x»  one  which  occurs  frequently  In  connection  with  allo¬ 
cation  and  Investment  problems,  the  functional  equation  Is 
replaced  by 


(7) 


Max  P(c,v)  4>  0{c,v 
0^v<c<*- 


Varlous  conditions  under  which  this  problem  has  a  solutl  n  of 
particularly  slaple  structure  are  given  In  [iTj  .  We  might  note 
In  passing  that  the  difficulty  Induced  by  a  constraint  f  the 
type  above  Is  due  to  the  fact  that  free  variation  Is  not  per¬ 
mitted  whenever  y  has  an  extreme  value  of  0  or  x,  and  c  nsequently 
Inequalities  replace  equalities. 

Further  discussion  of  these  techniques  will  be  found  In  M. 

M  .  Lit)  .  M  • 

^10.  Some  Examples— »]V ;  An  Eigenvalue  Problem 

This  functional-equation  approach  Is  also  appll''able  to 
eigenvalue  problems  associated  with  differential  equa*  Ions  '>f 
the  form 


( 1  /  — ♦  A*  ^  ( t )  u  •  ^ 

u(0)  -  u(l)  -  0 

where  we  are  Interested  In  the  vilaes  of  which  ylel-^  n^n’rlvlal 


solutions  u. 
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Under  suitable  conditions  upon  4(t),  this  problem  Is 


equlvilent  to  thst  of  determining  the  successive  mlr.lsw  of 

aubjeet  th.  conatralnta  t)u*dt  •  I,  u(0)  •  u(l) 


In  order  to  enploj  the  functional  equation,  we  Imbert  the 
problem  within  the  more  general  problem  of  determining  the  suc¬ 
cessive  minima  of 


(u)  •  J'  u 


subject  to  the  constraints 


(a)  u(m)  -  u(a4-t)  •  0, 


(b)  S  i(S)u*dS  ♦  K  i(S)(a^t-S)u(S)dS  -  I 


Writing  '^In  J(u)  •  f(a,k,t),  we  can  '^erlve  a  partial  dlffer- 
u 

entltl  equation  for  f,  which  Is  nonlinear.  Using  th#*  fact  that 
i  may  be  considered  constant,  and  equal  to  smnll  t, 

this  equation  may  be  used  to  determine  the  eigenvalues  computa¬ 
tionally  (see  (i j  , 


^1 1 .  Some  Eaawples — V;  Qames  of  Survival 

As  our  last  example,  let  us  consider  a  particularly  Interest 
Ing  example  of  a  multistage  game,  the  sos-'lled  "game  of  sur— 
vial." 
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Let  us  assume  th^t  two  players,  A  and  B,  are  playing  a 
zero-sum  gam^  determined  by  the  matrix  A  •  1 .  J*' .  1  #  •  •  •  »N, 

and  th-'t  A  starts  Initially  with  an  amount  of  money  x,  and  B 
starts  Initially  with  y.  Both  are  playing  the  gams  wl*h  the 
purpose  of  ruining  the  other.  How  should  both  play’ 

Let  us  dsf ine ,  for  x  and  y  positive 

(1)  f(x,y)  •  the  probability  that  A  ruins  B,  given 

that  A  starts  with  x,  and  B  with  y, 
and  both  play  optimally. 


I*  Is  clear  that  A  wishes 
B  wishes  to  minimize  it. 

For  other  values  of  x  and 


to  auixlmlxe  this  probability  and 


y.  r(x,y)  Is  '•eflned  as  follows: 


(?)  f(*»y)  *0,  X  1  0,  y  >  o 

-If  y  <  0,  X  >  0 


It  Is  now  clear  that  f(x,y)  satisfies  the  functional 
equa  t ion 

N 

(  >  )  r(x  ,y  )  •  Yax  '^In  [&, 


y.ln  Max 
q  P 


[ 


] 


Since  the  to’al  sum  of  money  In  the  game  remilns  constant,  It 
Is  clear  th't  we  can  replace  f(x,y)  by  a  functlor  of  one 
variable,  x. 


^  P-650 
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For  further  developaents ,  we  refer  the  reader  to  , 

Btl  »  •nd  to  a  recent  paper  by  Shapley  [53  • 

^12.  ApproKiaation  in  Policy  Space  and  Monotone  Convergence 

The  functional  equatlona  we  h-^ve  derived  above  are,  In  the 
main,  analytically  Intransigent.  The  theoretical  and  numerical 
properties  of  the  solutions  must  then  be  'derived  by  use  of  th?»i 
general  factotum  of  analysis,  the  method  of  successive  'approxi¬ 
mations.  If  our  functional  equation  has  *he  form 

(1)  f(p)-T(f(p)) 

as  do  those  above,  we  choose  an  initial  function  f^lp),  and 
obtain  a  sequence  of  functions  by  means  of  the  algorithm 

The  physical  background  will  usually  provide  precisely  the  con¬ 
ditions  required  for  geometric  convergence  of  this  sequence  lo 
the  solution  of  (1),  where  the  uniqueness  will  be  equally  gu-ir-'n- 
teed  by  the  easie  conditions.  This  technique  we  call  approxima¬ 
tion  in  function  space. 

Let  us  recall,  however,  that  in  '•  sense  the  function  f(p) 
is  not  of  paramount  Importance.  Rather,  It  is  the  optimal  poli¬ 
cies  which  yield  r(p)  that  are  the  most  important.  1*  follows 
that  it  may  be  wiser  to  approximate  t  '  optimal  policies  rather 
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than  approxinate  directly  to  maxlnutn  returns. 

In  many  ways  this  Is  a  simpler  and  mnre  natural  technique, 
as  well  as  more  practical  In  applications.  Tne  principle  theo¬ 
retical  advantage  lies  In  the  fact  that  we  now  obtain  monotone 
convergence . 

To  Illustrate  this  In  Its  simplest  fomi  let  u.a  con-’l-'er 
the  functional  equation  discussed  In 

(3)  f(x)  •  Max  f  g(y)  ♦  h(x-y)  ♦  r{ay  ♦  b(x-y))l 

0<y<x  *-  J 

Perhaps  the  simplest  Initial  guess  Is  to  assume  that  y  •  0  con¬ 
tinually.  This  yields  as  our  Initial  approximation  t  o  the 
maximum  return  the  function  fgCx)  satisfying  the  functional 
equation 

(^)  ro(x)  -  h(x)  ♦  ro(bx) 

It  Is  now  clear  that  the  function  f»(x)  determined  by 

(5)  fi(*)  •  Max  r  g(y)  ♦  h(x-y)  ♦  ^*0^**^  b(x-y)) 

0^<x  L 

Is  always  greater  than  or  equal  to  f^fx).  Hence,  Inductively, 

If 

I  h(x-y)  ♦  fn("y^b(x-y) )  I  ,  n-0,l 


(6) 


t  • 
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«•  hav« 

and  thus  Monoton#  convergence,  see  QjJ  ,  [8]. 

A  coiapletelj  analog  us  technique  la  applicable  to  con¬ 
tinuous  processes,  and  in  particular  the  calculus  of  vnriatl'^nn. 
The  results  are  particularly  interesting  in  connection  vith 
eigenvalue  probleas  where  we  obtain  aonotone  convergence,  , 

[19 

^13*  Further  Results 

We  have  not  the  space  here  to  discuss  any  of  a  number  :)f 
other  interesting  and  important  problems  in  dynamic  prograimlng. 
For  those  interested  in  bottleneck  problems  occurring  In 
multi-stage  production  processes,  we  refer  to  W.  D"!.  &7]. 
Those  Interested  in  scheduling  problems  may  consult 

[23].  [33]. 

A  number  of  mathematical  problems  occurring  in  connection 
with  the  control  of  engineering  economic  systems  are  discussed 

in  [i<g,  |2l]. 

Finally,  we  should  like  to  mention  a  number  of  papers  c'^n— 
cemed  with  the  very  difficult  mathematical  pr  blems  occurring 
in  the  general  theory  of  learning  processes,  ^5^]  ,  C35]  , 

and  . 
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